Federated Evaluation


FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom

He, Yuanqin, Kang, Yan, Fan, Lixin, Yang, Qiang

arXiv.org Artificial Intelligence

Federated Learning (FL) has emerged as a promising solution for the collaborative training of large language models (LLMs). However, integrating LLMs into FL introduces new challenges, particularly for evaluation. Traditional evaluation methods that rely on labeled test sets and similarity-based metrics cover only a subset of the acceptable answers and therefore fail to accurately reflect the performance of LLMs on generative tasks. Meanwhile, automatic evaluation methods that leverage advanced LLMs show promise, but they carry critical risks of data leakage, since data must be transmitted to external servers, and perform suboptimally on downstream tasks for lack of domain knowledge. To address these issues, we propose a Federated Evaluation framework for Large Language Models, named FedEval-LLM, that provides reliable performance measurements of LLMs on downstream tasks without relying on labeled test sets or external tools, thereby ensuring strong privacy preservation. FedEval-LLM leverages a consortium of participants' personalized LLMs as referees, which contribute domain knowledge and collective evaluation capability, aligning the evaluation with the respective downstream tasks and mitigating the uncertainty and bias associated with a single referee. Experimental results demonstrate a significant improvement in the evaluation capability of personalized evaluation models on downstream tasks. When applied to FL, these evaluation models exhibit strong agreement with human preference and with Rouge-L scores on carefully curated test sets. FedEval-LLM effectively overcomes the limitations of traditional metrics and the reliance on external services, making it a promising framework for evaluating LLMs in collaborative training scenarios.
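The core idea of a referee consortium can be illustrated with a minimal sketch (not the paper's implementation): several local judge models each score a candidate answer independently, and the scores are averaged so that no single referee's bias dominates. The referee functions and scores below are made-up placeholders.

```python
# Sketch of collective evaluation by a consortium of referees.
# Each referee is a stand-in for a personalized judge LLM that
# returns a quality score in [0, 1] for a (question, answer) pair.
from statistics import mean

def referee_a(question, answer):
    return 0.8  # hypothetical domain-expert judge

def referee_b(question, answer):
    return 0.6  # hypothetical second judge

def referee_c(question, answer):
    return 0.7  # hypothetical third judge

def collective_score(question, answer, referees):
    """Average the independent scores of all participating referees."""
    scores = [judge(question, answer) for judge in referees]
    return mean(scores)

score = collective_score("Q", "A", [referee_a, referee_b, referee_c])
print(round(score, 2))  # 0.7
```

Averaging is only one possible aggregation rule; majority voting or rank-based aggregation would fit the same interface.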


A Survey of Federated Evaluation in Federated Learning

Soltani, Behnaz, Zhou, Yipeng, Haghighi, Venus, Lui, John C. S.

arXiv.org Artificial Intelligence

In traditional machine learning, model evaluation is trivial to conduct since all data samples are managed centrally by a server. However, model evaluation becomes a challenging problem in federated learning (FL), which we call federated evaluation in this work, because clients do not expose their original data in order to preserve data privacy. Federated evaluation plays a vital role in client selection, incentive mechanism design, and malicious attack detection, among other applications. In this paper, we provide the first comprehensive survey of existing federated evaluation methods. Moreover, we explore various applications of federated evaluation for enhancing FL performance and finally present future research directions by envisioning some challenges.


MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation

Karargyris, Alexandros, Umeton, Renato, Sheller, Micah J., Aristizabal, Alejandro, George, Johnu, Bala, Srini, Beutel, Daniel J., Bittorf, Victor, Chaudhari, Akshay, Chowdhury, Alexander, Coleman, Cody, Desinghu, Bala, Diamos, Gregory, Dutta, Debo, Feddema, Diane, Fursin, Grigori, Guo, Junyi, Huang, Xinyuan, Kanter, David, Kashyap, Satyananda, Lane, Nicholas, Mallick, Indranil, Mascagni, Pietro, Mehta, Virendra, Natarajan, Vivek, Nikolov, Nikola, Padoy, Nicolas, Pekhimenko, Gennady, Reddi, Vijay Janapa, Reina, G Anthony, Ribalta, Pablo, Rosenthal, Jacob, Singh, Abhishek, Thiagarajan, Jayaraman J., Wuest, Anna, Xenochristou, Maria, Xu, Daguang, Yadav, Poonam, Rosenthal, Michael, Loda, Massimo, Johnson, Jason M., Mattson, Peter

arXiv.org Artificial Intelligence

Medical AI has tremendous potential to advance healthcare by supporting the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving provider and patient experience. We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data. To meet this need, we are building MedPerf, an open framework for benchmarking machine learning in the medical domain. MedPerf will enable federated evaluation in which models are securely distributed to different facilities for evaluation, thereby empowering healthcare organizations to assess and verify the performance of AI models in an efficient and human-supervised process, while prioritizing privacy. We describe the current challenges healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status, and our roadmap. We call for researchers and organizations to join us in creating the MedPerf open benchmarking platform.
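The federated-evaluation pattern described above can be sketched in a few lines, under the assumption that each facility runs the model on its own labeled data and returns only summary metrics. The facility names, toy model, and data below are illustrative, not part of MedPerf.

```python
# Sketch of federated evaluation: the model travels to each facility,
# metrics are computed locally, and only aggregates leave the site.

def evaluate_locally(model, local_data):
    """Run the model on one facility's labeled data; return accuracy only."""
    correct = sum(1 for x, y in local_data if model(x) == y)
    return correct / len(local_data)

def federated_evaluate(model, facilities):
    """Collect per-facility metrics; raw patient data never leaves a site."""
    return {name: evaluate_locally(model, data)
            for name, data in facilities.items()}

# Toy model and per-facility (input, label) pairs, purely hypothetical.
model = lambda x: x % 2
facilities = {
    "hospital_a": [(1, 1), (2, 0), (3, 1), (4, 1)],
    "hospital_b": [(5, 1), (6, 0)],
}
print(federated_evaluate(model, facilities))
# {'hospital_a': 0.75, 'hospital_b': 1.0}
```

Reporting per-facility metrics rather than a single pooled number also surfaces performance disparities across heterogeneous sites, which is a stated goal of large-scale benchmarking.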


Federated Evaluation of On-device Personalization

Wang, Kangkang, Mathews, Rajiv, Kiddon, Chloé, Eichner, Hubert, Beaufays, Françoise, Ramage, Daniel

arXiv.org Machine Learning

Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yields desirable models. We report on our experiments personalizing a language model for a virtual keyboard for smartphones with a population of tens of millions of users. We show that a significant fraction of users benefit from personalization.
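The headline measurement, the fraction of users who benefit from personalization, can be sketched as a simple per-user before/after comparison. The accuracy numbers below are invented for illustration; in the federated setting each comparison would be computed on-device and only the aggregate would be reported.

```python
# Sketch: measure what fraction of users improve after on-device
# personalization, comparing a per-user metric before and after.

def fraction_benefiting(baseline, personalized):
    """Fraction of users whose metric strictly improved after personalization."""
    improved = sum(1 for b, p in zip(baseline, personalized) if p > b)
    return improved / len(baseline)

# Hypothetical per-user accuracies for the global vs. personalized model.
baseline_acc     = [0.60, 0.72, 0.55, 0.80]
personalized_acc = [0.66, 0.70, 0.61, 0.83]
print(fraction_benefiting(baseline_acc, personalized_acc))  # 0.75
```

Using a strict inequality counts only users who genuinely improve; a tolerance threshold could be added to ignore noise-level changes.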